Why Shuffle Feels Broken
Spotify had a famously counterintuitive issue: their shuffle was too random. Users complained about same-artist songs playing back-to-back, or hearing the same track twice in an hour. Statistically expected. Perceptually broken.
I needed the same thing in a different context: sequences of numbers that felt random to a human observer, for experimental stimuli and game design. True randomness is full of clusters, streaks, and coincidences, and our brains are pattern-detection machines that see structure everywhere. Generating sequences that satisfy a human's sense of randomness means actively removing the patterns we're wired to notice.
What Humans Notice
Three pattern types trigger a "this isn't random" reaction:
Repetitions. 3 7 2 3 7, the subsequence 3 7 appears twice. Even at a distance, the brain picks up on repeated fragments. Most jarring of the three.
Mirrors. 1 4 8 4 1 forms a palindrome. Subtler, but detectable even when slightly offset from center.
Monotonic runs. 2 3 4 5 is an ascending run. Three consecutive ascending or descending values already feel non-random, and four in a sequence of ten dominates the perception.
The Penalty System
The algorithm starts with a random sequence and iteratively improves it by swapping elements and checking whether each swap reduces the total pattern penalty.
Each pattern type gets a base penalty that grows exponentially with length:
penalty = ((H^s) - 1) / Ω
H is the base (tuned per pattern type), s is the pattern length, Ω is a distance-based mitigation factor. The base values encode perceptual salience:
- Repetitions: H = 1.8 (harshest)
- Mirrors: H = 1.3
- Ascending/descending runs: H = 1.1
Distance mitigation follows a power law, Ω = distance^1.5, so nearby patterns get penalized at near-full strength while distant ones are gradually forgiven.
Detection leans on NumPy's diff: first derivative exposes runs (constant derivative = monotonic), and differences between subsequences at various offsets reveal repetitions and mirrors. For mirrors, I compare the sequence with its reverse at every possible center point.
The search runs for series_length × 100 × 10/items iterations. For a typical 30-element sequence with 10 possible values, that's about 30,000 swaps evaluated.
My first attempt tried to build sequences element-by-element, checking constraints as it went. It kept painting itself into corners, impossible states where no valid next element existed. The swap-and-evaluate approach is less elegant but far more robust.
Results
The output sequences don't look random in the mathematical sense, a chi-squared test would probably flag them for non-uniform transition statistics. But they feel random. No element predicts the next. No fragment echoes another.
I tested this informally by showing people pairs: one truly random sequence, one generated by the algorithm. Most people rated the algorithm's output as "more random." The truly random sequences kept getting flagged for suspicious patterns that were, of course, just coincidence.

The penalty framework (exponential growth with distance mitigation, three tunable bases and one exponent) ended up being more flexible than I expected. The same ((H^s) - 1) / Ω structure could work for rhythm generation, color sequences, or procedural level layout: anywhere humans need to perceive randomness without tolerating the real thing.